On Utilization and Importance of Perl Status Reporter (SRr) in Text Mining

Authors

  • Sugam Sharma
  • Tzusheng Pei
  • Hari Cohly
Abstract

In bioinformatics, text mining (sometimes used interchangeably with text data mining) is the process of deriving high-quality information from text. Perl Status Reporter (SRr) [1] is a tool for fetching data from a flat text file, and in this research paper we illustrate its use in text/data mining. SRr takes as input the flat text file on which the mining is to be performed, reads it, and derives high-quality information from it. Typical text mining tasks are text categorization, text clustering, concept and entity extraction, and document summarization. SRr can be used for any of these tasks with little or no customization effort. In our implementation we perform a text categorization operation on the input file. The input file has two parameters of interest (firstKey and secondKey). The combination of these two parameters determines the uniqueness of entries in the file, much as a composite key does in a database. SRr reads the input file line by line, extracts the parameters of interest, and forms a composite key by joining them. It then generates an output file named firstKey_secondKey. As SRr reads the input file it tracks the composite key, and it stores all data lines that share the same composite key in the output file generated for that key.
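The categorization step described in the abstract can be sketched roughly as follows. This is not the authors' SRr code: it is written in Python rather than Perl, and the function name `split_by_composite_key`, the whitespace-delimited layout, and the default key columns (0 and 1) are assumptions made for illustration, since the abstract does not specify the input format.

```python
import os
from collections import defaultdict


def split_by_composite_key(input_path, output_dir, first_col=0, second_col=1, sep=None):
    """Group the lines of a flat text file by a (firstKey, secondKey) pair.

    Hypothetical helper: the column positions and the separator are assumed,
    not taken from the paper. Returns a dict mapping each composite key to
    the path of the output file written for it.
    """
    groups = defaultdict(list)
    with open(input_path, encoding="utf-8") as src:
        for line in src:
            fields = line.split(sep)  # sep=None splits on any whitespace
            if len(fields) <= max(first_col, second_col):
                continue  # skip blank lines and lines missing a key field
            # Join the two parameters of interest into the composite key.
            key = f"{fields[first_col]}_{fields[second_col]}"
            groups[key].append(line if line.endswith("\n") else line + "\n")

    os.makedirs(output_dir, exist_ok=True)
    written = {}
    for key, lines in groups.items():
        # One output file per composite key, named firstKey_secondKey.
        path = os.path.join(output_dir, key)
        with open(path, "w", encoding="utf-8") as out:
            out.writelines(lines)
        written[key] = path
    return written
```

Calling `split_by_composite_key("input.txt", "out")` would write one file per distinct key pair under `out/`. The sketch assumes key values contain no path separators; a real tool would need to sanitize them before using them as file names.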

Similar resources

On Generation of Firewall Log Status Reporter (SRr) Using Perl

Apart from bioinformatics, computer system administration and network administration are among the areas where Practical Extraction and Report Language (Perl) is used heavily these days. The key role of a system/network administrator is to monitor log files. Log files are updated every day. To scan the summary of large log files and to quickly determine if there is anything wrong with the se...

Perl Status Reporter (SRr) on Spatiotemporal Data Mining

Perl Status Reporter (SRr) [1] is a data mining tool, and in this research work we illustrate the use of SRr on a spatio-temporal database. Spatio-temporal data are associated with time and space and are widely used in applications across diverse areas such as geography, geology, city planning, agriculture, environmental study, traffic navigation, aerospace industries, and so on. We exploit ...

Designing a System for Trend Analysis of Users in Website Surfing in Iran Using Data Mining and Text Mining Algorithms

Background and Aim: Because web surfing has become part of the lifestyle of a vast majority of people in society, and more accurate social and cultural policy making is needed in this field, the authors set out to analyze how users in society view different websites, so as to help politicians and practitioners. Methods: A design science research method is used in this research...

BioC implementations in Go, Perl, Python and Ruby

As part of a community-wide effort for evaluating text mining and information extraction systems applied to the biomedical domain, BioC is focused on the goal of interoperability, currently a major barrier to wide-scale adoption of text mining tools. BioC is a simple XML format, specified by DTD, for exchanging data for biomedical natural language processing. With initial implementations in C++ ...

Mining Interesting Aspects of a Product using Aspect-based Opinion Mining from Product Reviews (RESEARCH NOTE)

As the internet and its applications grow, e-commerce has become one of its fastest-growing applications. E-commerce customers have been given the opportunity to express their opinions about a product on the web as text, in the form of reviews. In previous studies, merely finding the sentiment of a review was not enough to get the exact opinion it expressed. In this paper, we have used A...

Journal title:
  • CoRR

Volume abs/1001.3277

Publication date 2010